In a recent blog post, we tackled the challenging and controversial task of defining what good sound is—the characteristics that separate great audio devices from merely good ones, and the good from the not so good. In this follow-up article, we’ll discuss how we test audio devices to determine whether they sound good, not only to us but also to you. And we’ll explain how you can use some of the same methods to confidently judge audio equipment on your own.
Below, we cover the different categories one by one. If you have any questions, let us know in the comments section. After all, the number one priority in our testing is to use methods that will help you find audio gear that works best for you.
This is the second in a two-part series in which we discuss how we evaluate good sound in our speaker and headphone reviews.
First, we try to solicit the opinions of multiple people about the audio devices we test so that our guides aren’t overly influenced by one writer’s opinion. Second, we try to make our guides comprehensive by reviewing as many models as we can within a given category and price range. Sometimes these two rules conflict—the more models we test, the more difficult it is to get them all in front of a listening panel—but those are our guiding principles.
For most of our audio guides, the writer starts with an extended solo testing session in which they weed out all the obviously poor performers. This step requires us to make certain assumptions about what our listening panel will and won’t like, but each of us has conducted many dozens of panel tests, through which we’ve identified some audio characteristics that are sure to sink a device’s performance: excessive distortion (extra harmonic content that isn’t in the recording, which can produce tearing or rattling sounds), extreme tonal-balance errors (too much or too little bass, midrange, or treble), and extraneous noise such as hiss and hum.
For our panel tests, we try to narrow down the number of contenders to six or fewer. The more models we add, the harder it is for the listeners to keep mental track of what each one sounds like and how they all compare. This is especially true when we conceal the identities of the models, which we do when we test speakers.
By “conventional speakers,” we mean mostly box-shaped models that usually have at least two drivers: a woofer for the bass and a tweeter for the treble. This general category includes bookshelf speakers, outdoor speakers, computer speakers, and surround-sound speaker systems—and we’ll throw in budget subwoofers and high-performance subwoofers here since our testing methods for them are similar.
Our listening panels may include a mix of Wirecutter staffers, musicians, audio professionals, and audiophiles of our acquaintance. For these tests, we conceal the contenders behind thin, black fabric that has a negligible effect on sound quality. Whenever possible, we use a switching device that allows the panelist to listen to each speaker for as long as they wish and then switch to the next speaker; the speakers are identified only by number, and we change those numbers for every panelist to minimize any advantage a contender might gain by going first or last.
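The article doesn’t say exactly how the numbers are reassigned for each panelist. One simple scheme that meets the stated goal is a cyclic (Latin-square) rotation: across any run of n panelists, each of n speakers is heard in every position exactly once, so no contender consistently benefits from going first or last. The sketch below is a hypothetical illustration under that assumption, not Wirecutter’s actual procedure; the function names and speaker labels are invented for the example.

```python
from typing import Dict, List, Sequence


def rotated_orders(speakers: Sequence[str], n_panelists: int) -> List[List[str]]:
    """One listening order per panelist, built by cyclic (Latin-square) rotation.

    Panelist i starts at position i mod n, so across any n consecutive
    panelists every speaker is heard in every position exactly once.
    """
    if not speakers:
        raise ValueError("need at least one speaker")
    n = len(speakers)
    orders: List[List[str]] = []
    for i in range(n_panelists):
        shift = i % n
        orders.append(list(speakers[shift:]) + list(speakers[:shift]))
    return orders


def numbered_labels(order: List[str]) -> Dict[int, str]:
    """Map the number a panelist sees (1, 2, ...) to the concealed speaker."""
    return {number: speaker for number, speaker in enumerate(order, start=1)}


if __name__ == "__main__":
    contenders = ["Speaker A", "Speaker B", "Speaker C", "Speaker D"]  # hypothetical
    for panelist, order in enumerate(rotated_orders(contenders, 4), start=1):
        print(f"Panelist {panelist}: {numbered_labels(order)}")
```

A plain rotation balances position but not which speaker immediately precedes which; shuffling the order randomly for each panelist is a simpler alternative that balances both only on average.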
Research shows that this kind of blind evaluation is essential to judging sound quality validly. As detailed in the 1994 Audio Engineering Society paper “Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things” by Floyd E. Toole and Sean Olive, “when listeners knew what they were listening to, the opinions were dictated more by the product identity than by the sound.”
We match the playback level of the speakers using a shaped noise tone recorded from a Dolby Digital AV receiver. Because this tone is concentrated in the midrange, differences among the speakers in bass and treble response have little effect on the match. And because any speaker that is perceived as being louder is likely to be perceived as being better, we ask the listeners to tell us if they notice any volume differences, and we correct those differences as necessary.
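As a rough illustration of the arithmetic behind level matching, the sketch below assumes the test tone is played through each speaker in turn and its level is read with an SPL meter at the listening position; it then computes how many decibels to cut each speaker so they all match. The readings, the choice of the quietest speaker as the target, and the function name are hypothetical, not a description of Wirecutter’s switching hardware.

```python
from typing import Dict, Optional


def level_trims(measured_db: Dict[str, float],
                target_db: Optional[float] = None) -> Dict[str, float]:
    """Gain change in dB that brings each speaker to the same test-tone level.

    measured_db: SPL readings (dB) at the listening position, one per speaker,
    all taken with the same midrange-weighted test tone.
    target_db: level to match; defaults to the quietest speaker, so every
    trim is a cut (0 dB or negative) rather than a boost.
    """
    if not measured_db:
        raise ValueError("need at least one measurement")
    if target_db is None:
        target_db = min(measured_db.values())
    return {name: round(target_db - spl, 1) for name, spl in measured_db.items()}


if __name__ == "__main__":
    readings = {"Speaker 1": 78.5, "Speaker 2": 80.0, "Speaker 3": 79.2}  # hypothetical
    print(level_trims(readings))  # Speaker 2 needs about a 1.5 dB cut
```

Matching down to the quietest speaker avoids asking any amplifier channel for extra gain; the panelists’ reports of audible volume differences then serve as the final check.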
As much as possible, we want our panel of listeners to use music and movies that are familiar to them, so we encourage them to play material of their own choosing. If we notice that a listener’s test selections don’t adequately reveal the differences among the models—for example, if they try to test subwoofers with a cello recording or a surround-sound system with an old movie that isn’t in surround sound—we add some of our own test material.
Once the panelists have evaluated all of the speakers and given us their opinions, we reveal the identities and prices, talk them through the features, and ask them which ones they would buy—and if there are any others they would recommend for specific situations or needs. The writer of the guide then crunches all the data and proposes picks to our team’s editors, and as a team we decide on picks that suit the needs of a variety of Wirecutter readers.
As a final gut check, we also run frequency response measurements for many of the conventional speakers we test, particularly those that we name as picks, to make sure they don’t have any glaring technical flaws that our panel of listeners might have missed. With subwoofers, and some conventional speakers, we perform CTA-2010 bass-output measurements that show how loudly a model can play the deep bass frequencies, which are especially challenging for smaller speakers to reproduce.
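CTA-2010 results are commonly reported as maximum output (dB SPL) at one-third-octave center frequencies, summarized as an “ultra-low bass” average (20 to 31.5 Hz) and a “low bass” average (40 to 63 Hz), with the averaging done in pascals rather than in decibels. The sketch below shows only that averaging step under those assumptions; it does not perform the measurement itself, and the example numbers and function names are hypothetical.

```python
import math
from typing import Dict, Iterable

P_REF = 20e-6  # reference sound pressure for dB SPL, in pascals

ULTRA_LOW = (20.0, 25.0, 31.5)  # 1/3-octave centers, Hz
LOW = (40.0, 50.0, 63.0)


def spl_to_pa(spl_db: float) -> float:
    """Convert a level in dB SPL to sound pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)


def pa_to_spl(pa: float) -> float:
    """Convert sound pressure in pascals back to dB SPL."""
    return 20 * math.log10(pa / P_REF)


def band_average(results: Dict[float, float], bands: Iterable[float]) -> float:
    """Average max-output results (dB SPL) over the given bands, computed in pascals."""
    pressures = [spl_to_pa(results[f]) for f in bands]
    if not pressures:
        raise ValueError("need at least one band")
    return pa_to_spl(sum(pressures) / len(pressures))


if __name__ == "__main__":
    # Hypothetical max-output results for a subwoofer, dB SPL by frequency (Hz).
    results = {20.0: 98.0, 25.0: 104.0, 31.5: 108.0, 40.0: 112.0, 50.0: 114.0, 63.0: 115.0}
    print(f"Ultra-low bass avg: {band_average(results, ULTRA_LOW):.1f} dB")
    print(f"Low bass avg: {band_average(results, LOW):.1f} dB")
```

Averaging in pascals weights the louder bands more heavily, so the result lands above the simple decibel mean: bands of 100 dB and 110 dB average to about 106.4 dB rather than 105 dB.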
Our test procedures for headphones and earbuds work somewhat differently because it’s practically impossible to conceal their identities from the listener. Even if we were to blindfold the listener and put the headphones or earbuds on them, they’d still need to adjust the positioning manually—and from the feel of the headphones in their hand and around or in their ears, they’d get some idea of the shape, the size, and possibly the brand. (Some researchers experimented with attaching handles to the headphones so that the listener could adjust them without getting any clues to the design, but they abandoned that practice because the handles could affect the performance.)
Additionally, a factor that’s crucial to the success of any pair of headphones or earbuds is the ergonomics. Part of our evaluation considers whether a tester can get the earcups of the headphones to fit comfortably around their ears, for example, or whether a tester can obtain a good seal using the ear tips included with the earbuds. If the listener can’t get a good fit, air may leak out, and the sound quality probably won’t be what the designers intended. If a panelist is unable to wear the headphones as designed, we take note of what aspects are preventing that (for example, large ear canals or a small head size) so that we can better target our recommendations and take the fit problems into account in our sound-quality discussions.
As with the other audio categories, the writer of the guide starts by weeding out all the obvious underperformers. They may also eliminate models that have basic operational issues, such as failing to pair reliably over Bluetooth. After we complete that process, we ask our panel of listeners to evaluate the remaining models one after another, in whatever order they wish and using music of their choosing.
Today’s headphones and earbuds have many features that may affect sound quality, including noise cancellation, preset sound modes, and equalization (EQ) controls to fine-tune the sound. We ask our testers to try these features and particularly to use any EQ or presets to adjust the sound to their preferences. We take the success or failure of this endeavor into account in our overall performance evaluation.
We also advise our testers to adjust the volume so that the levels of the different headphones or earbuds match as closely as possible, but it’s not practical for us to match the levels on headphones and earbuds the way we do for speakers. Although we do possess the lab gear required to measure and match the volume levels, headphone and earbud measurements are inherently imprecise: A slight change of the headphones’ positioning on the test equipment can easily result in a measured level difference of 2 or 3 decibels. And the fact that every listener’s ears and ear canals are shaped somewhat differently from the simulated body parts on the test fixture (which are created based on averages of hundreds of people) may also affect the utility of such measurements.
Once our panelists have shared their opinions of the headphones or earbuds, the writer of the guide consults with our editors to settle on the picks.
Generally, we don’t perform frequency response measurements on headphones or earbuds. Speaker measurements are fairly easy to interpret (basically, the flatter the line, the better), but understanding headphone and earbud measurements requires a great deal of expertise. As we stated above, headphone measurements are imprecise, and as demonstrated in a white paper from audio-technology company Sonarworks, the measurements produced by various headphone-measurement devices are inconsistent with one another. However, we do measure headphones’ noise-cancelling capability, and you can read about that testing methodology in our guide to the best noise-cancelling headphones.
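To make “the flatter the line, the better” concrete for speakers, one simple summary is the peak-to-peak spread of a frequency response over a chosen range: the smaller the spread, the flatter the curve. The sketch below assumes a measured response is available as paired frequency and level lists; the function name, the range limits, and the example data are hypothetical, and this single number is a rough stand-in for looking at the whole curve, not Wirecutter’s scoring method.

```python
from typing import Sequence


def response_spread_db(freqs_hz: Sequence[float], levels_db: Sequence[float],
                       lo_hz: float, hi_hz: float) -> float:
    """Peak-to-peak variation (dB) of a response between lo_hz and hi_hz.

    Smaller is flatter: a result of 6.0 means the curve stays within a
    6 dB window (roughly +/-3 dB around its midpoint) over that range.
    """
    if len(freqs_hz) != len(levels_db):
        raise ValueError("frequency and level lists must be the same length")
    in_range = [lvl for f, lvl in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if not in_range:
        raise ValueError("no measurement points in the requested range")
    return max(in_range) - min(in_range)


if __name__ == "__main__":
    freqs = [50, 100, 1000, 10000, 20000]  # Hz, hypothetical measurement points
    levels = [70, 85, 86, 83, 75]          # dB, hypothetical
    print(response_spread_db(freqs, levels, 100, 10000))
```

Restricting the range matters: a small speaker’s natural bass roll-off below its range would otherwise dominate the number and hide how even the midrange and treble are.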
In many ways, soundbars and Bluetooth speakers work similarly to conventional speakers. The difference is that all of the models we test in these categories have built-in amplification, along with digital signal processing (DSP) that tunes the sound of the speaker and may provide other functions, such as bass and treble controls, special listening modes, and surround sound. So there’s usually a bit more for us to consider in our audio evaluations.
With both of these categories, the writer of the guide goes through the same vetting procedure as with speakers or headphones, eliminating the models that are obviously unlikely to win the approval of a listening panel. The writer might also dismiss models that have basic functionality problems, such as soundbars that have trouble connecting to a TV through an HDMI cable, Bluetooth speakers that refuse to pair with the writer’s phone or tablet, or models that require the use of an app but consistently fail to communicate with it properly.
Our listening tests for soundbars and Bluetooth speakers resemble those for conventional speakers. We test soundbars with music and movies; we test Bluetooth speakers with music and occasionally with podcasts. We match the listening levels to the best of our ability, though unfortunately the volume adjustments on these devices tend to be fairly coarse and do not permit the precise level matching we can usually achieve with conventional speakers. As much as possible, we use the soundbars or Bluetooth speakers at their factory-default settings or, in the case of soundbars, in the mode appropriate for the content being played (such as music or movies).
Once our tests are complete, the writer discusses the contenders’ identities, prices, and features with the panelists. The writer may also take this opportunity to demonstrate any noteworthy sound modes a contender may have and, in the case of Bluetooth speakers, crank up the speaker to full volume to show how loud it can play and how clear (or distorted) it sounds at that volume.
We don’t measure frequency response on soundbars and Bluetooth speakers. Many soundbars employ surround-sound simulation technology that often cannot be deactivated and makes it impossible to obtain a useful frequency response measurement. Also, soundbars and Bluetooth speakers incorporate volume limiters that can have a huge effect on the sound—they can make strange sounds, “pump” as they automatically change the volume up and down, or alter the frequency response depending on the volume.
However, we do measure the maximum volume of Bluetooth speakers, using the average playback level they’re able to achieve with loud pink noise and during a 35-second snippet of a recording with heavy dynamic range compression, namely ZZ Top’s “Chartreuse.” We also perform CTA-2010 bass-output measurements on the subwoofers included with soundbars, or on the soundbars themselves if they don’t come with subwoofers.
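The article doesn’t specify how the “average playback level” is calculated. If it is derived from a series of SPL readings taken during the pink-noise run or the 35-second snippet, one standard approach is energy averaging (as in an Leq measurement): each reading is converted to relative intensity, averaged, and converted back to decibels. The sketch below assumes equally spaced readings; the sampling interval, data, and function name are hypothetical.

```python
import math
from typing import Sequence


def equivalent_level_db(readings_db: Sequence[float]) -> float:
    """Energy-average (Leq-style) of equally spaced SPL readings, in dB.

    Each reading is converted to relative intensity (10^(L/10)), averaged,
    and converted back, so loud passages count for more than in a plain dB mean.
    """
    if not readings_db:
        raise ValueError("need at least one reading")
    mean_intensity = sum(10 ** (level / 10) for level in readings_db) / len(readings_db)
    return 10 * math.log10(mean_intensity)


if __name__ == "__main__":
    # Hypothetical once-per-second readings over part of a test snippet, dB SPL.
    readings = [92.1, 93.4, 91.8, 94.0, 93.2, 92.7]
    print(f"Average level: {equivalent_level_db(readings):.1f} dB")
```

With a heavily compressed track like the one described, the readings vary little, so the energy average and a plain decibel mean come out close; the difference grows when the material is more dynamic.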
Although we’d like to think that Wirecutter readers can rely entirely on our judgment when it comes to buying audio devices—you could do worse!—it’s wise for anyone who is interested in music and movies to develop their own methods of evaluating audio gear. And it’s really not that hard. You probably can’t audition multiple models simultaneously through listening-panel tests or run frequency response measurements, but you can develop your own testing regimen that allows you to get a decent idea of an audio device’s performance in just a few minutes.
Wirecutter’s audio writers judge audio performance based on the sound attributes we outlined in part one of this series, with test tracks we’ve been using for years. We’ve chosen these tracks because they quickly reveal common audio flaws, and we’ve heard them played through hundreds or even thousands of audio devices. Some of our favorites are Holly Cole’s “Train Song,” José González’s “Heartbeats,” the live version of James Taylor’s “Shower the People,” Kanye West’s “Love Lockdown,” and Tracy Chapman’s “Fast Car.”
What you need, at minimum, is a couple of clear vocal tracks (preferably one male and one female), a track with lots of treble energy (such as cymbals, acoustic guitar, or flute), and a track with strong, deep bass. It’s best to use your own downloaded files or files ripped from CDs because a streaming service might at any point swap in a different master for a certain tune. You can get ideas from reading audio reviews and perhaps opinions from other audio enthusiasts—but you should never let anyone insist that you use only their material of choice when evaluating audio gear. The value of using your favorite test tracks is partly in the content of those recordings but also in your familiarity with them.
Once you’ve selected effective tracks and gotten to know them well, you should be able to get a good idea of the quality of a piece of audio equipment after playing three or four tunes. If it’s a device with readily apparent volume limitations, such as a small conventional speaker or a Bluetooth speaker, be sure to crank it up full-blast so you can see how well it handles high volumes.
If you’re testing soundbars or surround-sound systems, keep a couple of good action-movie DVDs or Blu-ray discs on hand, and choose scenes that effectively utilize the whole sound field, including the rear channels and subwoofer. It’s also a good idea to test a dialogue-heavy scene to see how clearly you can understand the vocals. You can Google something like “best Blu-ray test scenes” and get suggestions from home theater enthusiasts and professional reviewers. On those bass-heavy action scenes, be sure to crank up the volume to see how well the speakers handle it and to determine whether the subwoofer can deliver that satisfying couch-shake that makes action movies more fun.
We hope this article answers most of the questions Wirecutter readers have about our methods for testing audio devices, but if you have any more questions, drop them into the comments section below.
This article was edited by Adrienne Maxwell and Grant Clauser.
© 2022 Wirecutter, Inc., A New York Times Company